Method, computer program, system, and communication device for optimizing the capacity of communication channels

ABSTRACT

The invention relates to a method for optimizing a capacity of a communication channel in a communication system comprising at least a transmitter ( 10 ), a receiver ( 12 ), and the communication channel ( 11 ) between the transmitter and the receiver. The transmitter ( 10 ) uses a finite set of symbols Ω={ω₁, . . . , ω_(N)} having respective positions on a constellation, to transmit a message including at least one symbol on said communication channel ( 11 ). The communication channel ( 11 ) is characterized by a conditional probability distribution p_(Y|X)(y|x), where y is the symbol received at the receiver ( 12 ) while x is the symbol transmitted by the transmitter. More particularly, the conditional probability distribution p_(Y|X)(y|x) is obtained, for each possible transmitted symbol x, by a mixture model using probability distributions represented by exponential functions. An optimized input distribution p_(x)(x) is computed, based on parameters of the mixture model, to define optimized symbol positions and probabilities to be used at the transmitter for optimizing the capacity of the channel.

TECHNICAL FIELD

The present invention relates to the field of telecommunications and more particularly targets the problem of optimizing the capacity of communication channels.

BACKGROUND ART

The optimization can be implemented by computer means, and more particularly for example by artificial intelligence, and can be based on observations of whether messages transmitted from a transmitter via communication channels are well received or not at a receiver.

Especially, the case of mixture channels, where the optimal input distribution cannot be obtained theoretically, is difficult to address. The probability distribution can be decomposed on a functional basis.

The essential characteristics of a memoryless communication channel can be represented by the conditional probability distribution p_(Y|X)(y|x) of the output Y given the input X. Some examples of well-known communication channels are given below:

-   The additive white Gaussian noise channel, where Y=X+η and η is Gaussian-distributed, models a wired communication subject to perturbations caused by the thermal noise at the receiver.
-   The fading channel Y=α.X+η, where α follows a fading distribution (such as a Rayleigh distribution), models the transmission over a narrow-band wireless channel in a radio propagation environment involving rich scattering (a sampling sketch of these first two models is given after this list).
-   More complicated channels can involve non-linear effects. This is for example the case of optical channels, driven by the nonlinear Schrödinger equations, where the Kerr effect cannot be neglected and reduces the channel capacity when the transmit power is increased too much in Wavelength Division Multiplexing (WDM) transmissions.
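
As a purely illustrative aid (not part of the invention), the first two channel models above can be simulated as follows; all names and parameter values are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def awgn(x, sigma=0.1):
    """Additive white Gaussian noise channel: y = x + eta."""
    return x + sigma * rng.standard_normal(x.shape)

def rayleigh_fading(x, sigma=0.1):
    """Narrow-band fading channel: y = alpha * x + eta, alpha ~ Rayleigh."""
    alpha = rng.rayleigh(scale=1.0, size=x.shape)
    return alpha * x + sigma * rng.standard_normal(x.shape)

x = rng.choice([-1.0, 1.0], size=1000)  # BPSK symbols, chosen only for illustration
y = rayleigh_fading(x)                  # realizations of the channel output Y
```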

Once the conditional probability distribution p_(Y|X)(y|x) is accurately known, it is possible to optimize the communication system relying on:

-   The design of the input signal so as to maximize the mutual information between the input and the output of the channel,
-   The design of optimal receivers, which in general relies on the processing of the likelihood probabilities p_(Y|X)(y|x).

A solution is still needed to the problem of optimizing the transmission strategy, preferably by designing the probability of transmission and optionally the position of each symbol of a constellation (QAM, PSK, etc., for example). The main challenges are:

-   obtaining the channel conditional probability distribution, decomposed on a functional basis;
-   optimizing the input distribution.

The present invention aims to improve the situation.

SUMMARY OF INVENTION

To that end, it proposes a method for optimizing a capacity of a communication channel in a communication system comprising at least a transmitter, a receiver, and the communication channel between the transmitter and the receiver, the transmitter using a finite set of symbols Ω={ω₁, . . . , ω_(N)} having respective positions on a constellation, to transmit a message including at least one symbol on said communication channel, and the communication channel being characterized by a conditional probability distribution p_(Y|X)(y|x), where y is the symbol received at the receiver while x is the symbol transmitted by the transmitter.

More particularly, the aforesaid conditional probability distribution p_(Y|X)(y|x) is obtained, for each possible transmitted symbol x, by a mixture model using probability distributions represented by exponential functions, and an optimized input distribution p_(x)(x) is computed, based on parameters of said mixture model, to define optimized symbol positions and probabilities to be used at the transmitter for optimizing the capacity of the channel.

Therefore, the decomposed representation of the channel conditional probability distribution, in a basis of exponential distribution functions, is used in order to limit the computational complexity of computing the optimal input distribution. By improving the input signal probability distribution according to the channel knowledge, the channel capacity is thus significantly improved.

The aforesaid optimized symbol positions and probabilities can be obtained at the transmitter, but also at the receiver, in a particular embodiment.

In an embodiment, the transmitter can transmit messages conveyed by a signal belonging to a finite set of signals corresponding respectively to said symbols ω₁, . . . , ω_(N), each signal being associated with a transmission probability according to an optimized input signal probability distribution corresponding to said optimized input distribution p_(x)(x). In this embodiment then, the transmitter takes:

-   messages to be transmitted, and
-   the optimized input signal probability distribution

as inputs, and outputs a transmitted signal on the communication channel.

In this embodiment, the communication channel takes the transmitted signal as an input, and outputs a received signal intended to be processed at the receiver (in order to decode the received message at the receiver, typically), the aforesaid conditional probability distribution p_(Y|X)(y|x) being thus related to a probability of outputting a given signal y when the input x is fixed.

Preferably, in this embodiment, the conditional probability distribution p_(Y|X)(y|x) is defined on a continuous input/output alphabet, as a probability density function.

An estimation of the conditional probability distribution p_(Y|X)(y|x) is taken as input, to output the optimized input signal probability distribution p_(x)(x) to be obtained at the transmitter (and at the receiver in an embodiment), the conditional probability distribution estimation being then used for computing the optimized input signal probability distribution and being approximated by said mixture model.

In an embodiment, the receiver takes the received signal, and also the optimized input signal probability distribution p_(x)(x) and an estimation of the channel conditional probability distribution p_(Y|X)(y|x), as inputs, and performs an estimation of a message conveyed in said received signal.

Therefore, in this embodiment, the receiver can perform an enhanced determination of the conveyed message thanks to the optimized input signal probability distribution p_(x)(x), from which the channel conditional probability distribution p_(Y|X)(y|x) can be estimated.

In an embodiment, the aforesaid mixture model follows a conditional probability distribution p_(Y|X)(y|x) which is decomposable into a basis of probability distribution exponential functions g(y|x;θ), where θ is a parameter set, such that:

$p_{Y|X}(y|x) = \sum_{j=1}^{K} w_j\, g(y|x;\theta_j)$   (E)

where K is a predetermined parameter, the set {θ_(j)} gathers the distribution parameters (typically mean vector coordinates and covariance matrix parameters) and the set {w_(j)} gathers the mixture weights.

Moreover, in this embodiment, the probability distribution exponential functions g(y|x;θ) are more particularly given by g(y|x;θ)=h(y,θ)exp(x^(T)y−α(x,θ)), where h(y,θ) is a function of y and θ, and α(x,θ) is the moment generating function, x and y being vectors, such that the derivative of g(y|x;θ) is given by:

$\frac{\partial}{\partial x} g(y|x;\theta) = h(y,\theta)\left(y - \frac{\partial}{\partial x}\alpha(x,\theta)\right)\exp\left(x^{T}y - \alpha(x,\theta)\right)$
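
A quick illustrative check (added here for clarity, assuming the unit-variance scalar Gaussian channel, which is only a special case): taking h(y,θ)=e^(−y²/2)/√(2π) and α(x,θ)=x²/2 gives

$g(y|x;\theta) = \frac{e^{-y^2/2}}{\sqrt{2\pi}}\exp\left(xy - \frac{x^2}{2}\right) = \frac{1}{\sqrt{2\pi}}e^{-\frac{(y-x)^2}{2}}, \qquad \frac{\partial}{\partial x} g(y|x;\theta) = \frac{e^{-y^2/2}}{\sqrt{2\pi}}\,(y-x)\,\exp\left(xy - \frac{x^2}{2}\right),$

since ∂α(x,θ)/∂x = x; this matches the usual derivative of the Gaussian density.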

The aforesaid distribution p_(Y|X)(y|x) can be approximated by a finite set of continuous functions minimizing a metric defined by the Kullback-Leibler divergence, by determining the parameter sets {θ_(j)},{w_(j)} which minimize the Kullback-Leibler divergence between an analytical observation of p_(Y|X)(y|x) and its expression given by:

$p_{Y|X}(y|x) = \sum_{j=1}^{K} w_j\, g(y|x;\theta_j).$

The input distribution p_(x)(x) can be represented as a list of N constellation positions as {(x₁,π₁), . . . , (x_(N),π_(N))}, where x_(i) and π_(i) denote respectively constellation positions and probability weights,

and the input distribution p_(x)(x) is estimated by solving an optimization problem at the transmitter given by:

$(x^{*},\pi^{*}) = \underset{x,\pi}{\operatorname{argmax}}\; I(x,\pi)$

$\text{subject to } \sum_{i=1}^{N}\pi_i = 1$

$\sum_{i=1}^{N} |x_i|^{2}\pi_i \leq P$

$0 < \pi_i < 1, \text{ for } i = 1,\ldots,N$

Where:

-   I(x,π) is a mutual information as a function of the position vector x=[x₁, . . . , x_(N)]^(T) and the weight vector π=[π₁, . . . , π_(N)]^(T),
-   optimal values are tagged with the superscript *, and
-   P denotes a total transmit power.

The mutual information can be expressed as:

$I(x,\pi) = \frac{1}{M}\sum_{i=1}^{N}\sum_{m=1}^{M}\pi_i \log\frac{p_{Y|X}(y_{i,m}|x_i)}{\sum_{j=1}^{N}\pi_j\, p_{Y|X}(y_{i,m}|x_j)}, \quad \text{where } p_{Y|X}(y|x) = \sum_{j=1}^{K} w_j\, g(y|x;\theta_j),$

and the arguments y_(i,m) are samples from the distribution p_(Y|X)(y|x_(i)).

In this embodiment, an alternating optimization can be performed iteratively to calculate both p_(x)(x) and p_(Y|X)(y|x)=Σ_(j=1) ^(K) w_(j)g(y|x;θ_(j)), so as to derive from said calculations optimized weights π^((t)) and positions x^((t)), updated from a preceding iteration t−1 to a current iteration t as follows:

-   at first, the weights π^((t)) are optimized for a fixed set of symbol positions x^((t−1)) and previous weight values π^((t−1));
-   then, the symbol positions x^((t)) are optimized for the thusly determined π^((t)) and the previous values x^((t−1)),

and these two steps are repeated iteratively until a stopping condition occurs on the mutual information I(x,π).

This embodiment is described in detail below with reference to FIG. 2, showing an example of steps of an algorithm involving a gradient ascent on particles representing positions on a telecommunication constellation (QAM, PSK, or other).

The present invention aims also at a computer program comprising instructions causing a processing circuit to implement the method as presented above, when such instructions are executed by the processing circuit.

FIG. 2 commented below can illustrate the algorithm of such a computer program.

The present invention aims also at a system comprising at least a transmitter, a receiver, and a communication channel between the transmitter and the receiver, wherein the transmitter at least is configured to implement the method above.

The invention aims also at a communication device comprising a processing circuit configured to perform the optimization method as presented above.

BRIEF DESCRIPTION OF DRAWINGS

More details and advantages of possible embodiments of the invention will be presented below with reference to the appended drawings.

FIG. 1 is an overview of a system according to an example of embodiment of the invention.

FIG. 2 shows possible steps of an optimization method according to an embodiment of the invention.

FIG. 3 shows schematically a processing circuit of a communication device to perform the optimization method of the invention.

DESCRIPTION OF EMBODIMENTS

Referring to FIG. 1, a system according to the present invention comprises, in an example of embodiment, a transmitter 10, a receiver 12, a transmission channel 11 and an input signal probability distribution optimizer 13.

The transmitter 10 transmits messages conveyed by a signal belonging to a finite set of signals, each associated with a transmission probability according to an (optimized) input signal probability distribution. The transmitter 10 takes the messages and the (optimized) input signal probability distribution as inputs, and outputs the signal to be transmitted on the channel. The channel 11 takes the transmitted signal as an input, and outputs a received signal which is processed at the receiver 12 in order to decode the transmitted message. It is characterized by a channel conditional probability distribution of the probability of outputting a given signal when the input is fixed. The probability distribution can generally be defined on a discrete or continuous input and/or output alphabet. Here, as an example, the continuous output alphabet is considered, and the probability distribution is called a probability density function in this case.

The input signal probability distribution optimizer 13 takes the conditional probability distribution estimation as an input, and outputs the optimized input signal probability distribution to the transmitter 10 and the receiver 12.

It is worth noting here that the optimizer 13 can be a same module which is a part of both the transmitter and the receiver. It can alternatively be a module which is a part of a scheduling entity (e.g. a base station or other) in a telecommunication network linking said transmitter and receiver through the communication channel. More generally, a communication device such as the transmitter 10, the receiver 12, or else any device 13 being able to perform the optimization method, can include such a module which can have in practice the structure of a processing circuit as shown on FIG. 3. Such a processing circuit can typically comprise an input interface IN to receive data (at least data enabling the estimation of the conditional probability distribution), linked to a processor PROC cooperating with a memory unit MEM (storing at least instructions of a computer program according to the invention), and an output OUT to send results of optimization computations.

More particularly, the conditional probability distribution estimation is used for computing the optimized input signal probability distribution at the input signal probability distribution optimizer 13. In particular, it is shown hereafter that the optimization is made more efficient when the conditional probability distribution estimation is approximated by a mixture of exponential distributions.

The receiver 12 takes the received signal, the optimized input signal probability distribution and the estimated channel conditional probability distribution as inputs and performs an estimation of the message conveyed in the received signal.

The transmission channel 11 is represented by a model, hereafter, that follows a conditional probability distribution p_(Y|X)(y|x) that can be decomposed into a basis of probability distribution functions p(y|x;θ), where θ is a parameter set. For example, the distribution functions are from the exponential family and the parameters are essentially the mean and variance in the scalar case, and more generally the mean vector and covariance matrix in the multi-variate case, such that:

$p_{Y|X}(y|x) = \sum_{j=1}^{K} w_j\, p(y|x;\theta_j)$   (E)

where K and the sets {θ_(j)},{w_(j)} are parameters.

Three examples of channels following the model can be cited hereafter.

Channels might have random discrete states when the channel fluctuates randomly in time according to discrete events, such as:

-   interference by bursts that changes the signal-to-noise ratio from one transmission to another,
-   shadowing effects that change the received signal power,
-   an approximation of a random fading channel by a discrete distribution, where the channel coefficient α is random and follows p(α)=Σ_(j=1) ^(K) w_(j)δ(α−α_(j)), where α_(j) is one out of K possible values of the channel coefficient α occurring with a probability w_(j), and δ(.) is the Dirac delta function. Thus, in case of Gaussian noise with variance σ_η², such a fading channel leads to the probability distribution:

$p_{Y|X}(y|x) = \sum_{j=1}^{K} w_j\,\frac{e^{-\frac{|y-\alpha_j x|^2}{2\sigma_\eta^2}}}{\sqrt{2\sigma_\eta^2}}$
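
A minimal sketch (assuming scalar real signals and keeping the normalisation written above) of evaluating this mixture density could read:

```python
import numpy as np

def p_y_given_x(y, x, alphas, weights, sigma2):
    """Discrete-state fading mixture density, as in the expression above."""
    y = np.asarray(y, dtype=float)
    return sum(w * np.exp(-np.abs(y - a * x) ** 2 / (2 * sigma2)) / np.sqrt(2 * sigma2)
               for w, a in zip(weights, alphas))

# two equiprobable fading states; all values chosen arbitrarily for illustration
p = p_y_given_x(0.3, 1.0, alphas=[0.5, 1.2], weights=[0.5, 0.5], sigma2=0.01)
```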

In case of channel estimation impairments (typically when the transmission channel is imperfectly known), a residual self-interference is obtained on the received signal. In general, the channel model is obtained as $y = \hat{\alpha}x + \eta - vx$, which leads to:

$p_{Y|X}(y|x) = \sum_{j=1}^{K} w_j\,\frac{e^{-\frac{|y-\hat{\alpha} x|^2}{2(\sigma_\eta^2 + \sigma_v^2 |x|^2)}}}{\sqrt{2\left(\sigma_\eta^2 + \sigma_v^2 |x|^2\right)}}$

Therefore, it is shown here that any known continuous distribution p_(Y|X)(y|x) can be approximated by a finite set of continuous functions.

The approximation is done by minimizing a metric. One relevant metric is the Kullback-Leibler divergence, which gives a measure of the difference between two distributions. Thus, when p_(Y|X)(y|x) is known analytically, it is possible to find the parameter sets {θ_(j)},{w_(j)} that minimize the Kullback-Leibler divergence between p_(Y|X)(y|x) and an approximated expression in the form of equation (E) given above.

From an estimated histogram of p_(Y|X)(y|x), the distribution can likewise be approximated by a finite set of continuous functions, in the same way as with a known continuous distribution, by using the Kullback-Leibler divergence as a metric.
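
As an illustration of this fitting step (a sketch, not the claimed procedure), one can use expectation-maximization, which maximizes the likelihood of recorded samples and, for large sample sizes, amounts to minimizing the Kullback-Leibler divergence from the empirical (histogram) distribution; the file name and the choice K=3 are assumptions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# hypothetical recorded channel outputs y for one fixed input symbol x
y_samples = np.loadtxt("channel_outputs_for_x.txt").reshape(-1, 1)

K = 3                                      # number of mixture components, assumed
gm = GaussianMixture(n_components=K).fit(y_samples)
w = gm.weights_                            # mixture weights {w_j}
theta = list(zip(gm.means_.ravel(),        # component parameters {theta_j}:
                 gm.covariances_.ravel())) # (mean, variance) pairs
```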

The function p_(Y|X)(y|x) is bi-variate, with variables x and y which in general span a continuous domain.

Hereafter, a focus is made on symbols x belonging to a finite alphabet Ω={ω₁, . . . , ω_(N)} of cardinality N.

It is further assumed that the derivative of the probability distribution functions g(y|x;θ) is known. For example, when g(y|x;θ) is from the exponential family, it can be written:

g(y|x;θ)=h(y,θ)exp(x^(T) y−α(x,θ)),

where h(y,θ) is a function of y and θ, and α(x,θ) is the moment generating function, x and y being vectors in this general case. Thus,

$\frac{\partial}{\partial x} g(y|x;\theta) = h(y,\theta)\left(y - \frac{\partial}{\partial x}\alpha(x,\theta)\right)\exp\left(x^{T}y - \alpha(x,\theta)\right)$

For example, in the scalar Gaussian case, the derivative of the probability density function is thus given as follows:

$\frac{\partial}{\partial x}\,\frac{e^{-\frac{|y-\alpha_j x|^2}{2\sigma_\eta^2}}}{\sqrt{2\sigma_\eta^2}} = \frac{2\alpha_j\left(y-\alpha_j x\right)}{2\sigma_\eta^2}\,\frac{e^{-\frac{|y-\alpha_j x|^2}{2\sigma_\eta^2}}}{\sqrt{2\sigma_\eta^2}}$
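
This closed form can be checked numerically, e.g. with a finite-difference sketch like the following (all values arbitrary):

```python
import numpy as np

def g(y, x, a, s2):
    # Gaussian component, with the normalisation used in the text
    return np.exp(-np.abs(y - a * x) ** 2 / (2 * s2)) / np.sqrt(2 * s2)

def dg_dx(y, x, a, s2):
    # closed-form derivative given above
    return (2 * a * (y - a * x) / (2 * s2)) * g(y, x, a, s2)

y, x, a, s2, eps = 0.7, 1.0, 0.9, 0.04, 1e-6
numeric = (g(y, x + eps, a, s2) - g(y, x - eps, a, s2)) / (2 * eps)
assert abs(numeric - dg_dx(y, x, a, s2)) < 1e-5   # central difference agrees
```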

The input signal distribution optimizer 13 relies on the estimation of the channel probability distribution in the form of equation (E). When the functional basis chosen for the estimation of the channel is the exponential family, closed-form expressions can be derived and the algorithm converges to the optimal solution.

The capacity-achieving input is known to be discrete for some channels. For the case of a continuous capacity-achieving input (which is the case for more general channels), the input distribution p_(X)(x) can be represented as a list of N particles as

{(x₁,π₁), . . . , (x_(N),π_(N))}, where x_(i) and π_(i) denote the positions (i.e., represented by a set of coordinates or by a complex number in a 2-dimension case) and the weights, respectively. The optimization problem in the transmitter can be written as

$(x^{*},\pi^{*}) = \underset{x,\pi}{\operatorname{argmax}}\; I(x,\pi)$   (1)

$\text{subject to } \sum_{i=1}^{N}\pi_i = 1$   (2)

$\sum_{i=1}^{N} |x_i|^{2}\pi_i \leq P$   (3)

$0 < \pi_i < 1, \text{ for } i = 1,\ldots,N$   (4)

Where:

-   I(x,π) is the mutual information as a function of the position vector x=[x₁, . . . , x_(N)]^(T) and the weight vector π=[π₁, . . . , π_(N)]^(T),
-   the optimal values are shown with the superscript *, and
-   P denotes the total transmit power constraint, which is set arbitrarily. In general, this value is defined by a power budget of the transmitter, which is related to the physical limit of the power amplifier or to a maximum radiated power allowed by regulation.

The constraint (2) sets the total probability of particles to 1. Constraints (3) and (4) guarantee, respectively, that the total transmit power is less than or equal to P, and that the particle probabilities are positive values less than 1. The mutual information I(x,π) involves an integration on continuous random variables, but can be approximated by Monte-Carlo integration (the main principle of which is to replace the expectation, which usually involves an integration, by a generation of samples which are realizations of said random variable and an averaging of the obtained values) as

$I(x,\pi) = \frac{1}{M}\sum_{i=1}^{N}\sum_{m=1}^{M}\pi_i \log\frac{p_{Y|X}(y_{i,m}|x_i)}{\sum_{j=1}^{N}\pi_j\, p_{Y|X}(y_{i,m}|x_j)},$   (5)

where M denotes the number of samples (i.e., the number of realizations of the random variables generated from their probability distribution), and where

$p_{Y|X}(y|x) = \sum_{j=1}^{K} w_j\, g(y|x;\theta_j),$   (6)

thus denoting a decomposition of the conditional probability p_(Y|X)(y|x) into a basis of functions g(·) involving the parameters θ_(j).

The arguments y_(i,m) in (5) are the samples from the distribution p_(Y|X)(y|x_(i)).
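
A possible sketch of this Monte-Carlo estimate of (5), assuming user-supplied `sample` and `density` callables for the channel (both hypothetical names), is:

```python
import numpy as np

def mutual_information(x, pi, sample, density, M=2000):
    """Monte-Carlo estimate of I(x, pi) as in eq. (5).
    sample(x_i, M) -> M draws from p(y|x_i); density(y, x_j) -> p(y|x_j)."""
    N, I = len(x), 0.0
    for i in range(N):
        y = sample(x[i], M)                                   # y_{i,m}
        num = density(y, x[i])                                # p(y_{i,m}|x_i)
        den = sum(pi[j] * density(y, x[j]) for j in range(N))
        I += pi[i] * np.sum(np.log(num / den))
    return I / M
```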

Hereafter, an alternating optimization method is proposed, described from iteration t−1 to t as follows:

-   at first, optimize π^((t)) for a fixed set of particles x^((t−1)) and a previous value π^((t−1));
-   then, optimize x^((t)) for the obtained π^((t)) and a previous value x^((t−1)).

These two steps are detailed hereafter respectively as S1 and S2. They can intervene after an initialization step S0 of an algorithm presented below.

Step S1: Optimization of π^((t)) for a Fixed Set of Particles x^((t−1)) and a Previous Value π^((t−1))

The optimization in (1) is concave with respect to π for fixed values of x. So, for a given x^((t−1)), (1) is solved for π by writing the Lagrangian and solving for π_(i) for i=1, . . . , N as

$\pi_i^{(t)} = \frac{\exp\left(\beta|x_i^{(t-1)}|^2 + \frac{1}{M}\sum_{m=1}^{M}\log q(x_i^{(t-1)}|y_{i,m})\right)}{\sum_{j=1}^{N}\exp\left(\beta|x_j^{(t-1)}|^2 + \frac{1}{M}\sum_{m=1}^{M}\log q(x_j^{(t-1)}|y_{j,m})\right)},$   (7)

where

$q(x_i|y_{i,m}) = \frac{\pi_i^{(t-1)}\, p_{Y|X}(y_{i,m}|x_i)}{\sum_{j=1}^{N}\pi_j^{(t-1)}\, p_{Y|X}(y_{i,m}|x_j)}.$

Here, the expression

$\frac{1}{M}{\sum_{m = 1}^{M}{\log{q\left( x_{i}^{({t - 1})} \middle| y_{i,m} \right)}}}$

is the approximation of the mathematical expectation E[log q(x_(i)^((t−1))|y_(i))] according to the random variable y_(i). The approximation is performed by the above-mentioned Monte-Carlo integration, i.e., by generating M samples according to the distribution of y_(i). The term

$\frac{1}{M}{\sum_{m = 1}^{M}{\log{q\left( x_{i}^{({t - 1})} \middle| y_{i,m} \right)}}}$

can advantageously be replaced by a numerical integration or by a closed-form expression when available.

In (7), β denotes the Lagrange multiplier, which can be determined by replacing (7) in (3) with equality for the maximum total transmit power P, resulting in the non-linear equation

$\sum_{i=1}^{N}\exp\left(\beta|x_i^{(t-1)}|^2 + \frac{1}{M}\sum_{m=1}^{M}\log q(x_i^{(t-1)}|y_{i,m})\right)\left[P - |x_i^{(t-1)}|^2\right] = 0.$   (8)

The non-linear equation (8) can be solved using different tools, e.g., gradient-descent-based approaches such as Newton-Raphson, or by selecting several values of β, computing the left part of the equation in (8) and keeping the one closest to 0 in absolute value. The values of π_(i) ^((t)) are then obtained from (7).
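
For illustration, a line-search sketch of this step, assuming real scalar positions x and precomputed Monte-Carlo averages L_i = (1/M)Σ_m log q(x_i^((t−1))|y_(i,m)) (the β grid is an arbitrary assumption), could be:

```python
import numpy as np

def update_weights(x, L, P, betas=np.linspace(-50.0, 50.0, 10001)):
    """Solve eq. (8) for beta by line search, then eq. (7) for the weights."""
    a = np.abs(np.asarray(x)) ** 2

    def lhs(b):                     # left-hand side of (8), rescaled for stability
        t = b * a + L
        t = t - t.max()             # positive rescaling does not move the root
        return np.sum(np.exp(t) * (P - a))

    beta = min(betas, key=lambda b: abs(lhs(b)))  # closest to zero, as in the text
    logits = beta * a + L
    logits = logits - logits.max()                # numerical stabilisation
    pi = np.exp(logits)
    return pi / pi.sum(), beta
```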

Step S2: Optimization of x^((t)) for a Fixed π^((t)) and Previous x^((t−1))

The Lagrangian for the optimization in (1) with a given weight vector π^((t)) can be given by:

$\mathcal{L}(x;\beta,\pi^{(t)}) = I(x,\pi^{(t)}) + \beta\left(P - \sum_{i=1}^{N}|x_i|^2\,\pi_i^{(t)}\right).$   (9)

The position vector x is obtained such that the Kullback-Leibler divergence D(p_(Y|X)(y|x_(i))∥p_(Y)(y)), penalized by the second term in (9), is maximized. This way, the value of the Lagrangian $\mathcal{L}(x;\beta,\pi^{(t)})$, i.e., the penalized mutual information, is greater than or equal to the previous values after each update of the position and weight vectors. This is achieved by gradient-ascent-based methods, i.e.:

$x_i^{(t)} = x_i^{(t-1)} + \lambda_t \left.\frac{\partial}{\partial x_i} D\left(p_{Y|X}(y|x_i)\,\middle\|\,p_Y(y)\right)\right|_{x^{(t-1)},\pi^{(t)}}$

where the step size λ_(t) is a positive real number.

In the aforementioned gradient-ascent-based methods, it is required to compute the derivative of the term D(p_(Y|X)(y|x_(i))∥p_(Y)(y)) by Monte-Carlo integration as

$\left.\frac{\partial}{\partial x_i} D\left(p_{Y|X}(y|x_i)\,\middle\|\,p_Y(y)\right)\right|_{x^{(t-1)},\pi^{(t)}} \approx \frac{1}{M}\sum_{m=1}^{M} h\left(y_{i,m},x_i^{(t-1)}\right)\left[1 + \log\frac{p_{Y|X}(y_{i,m}|x_i^{(t-1)})}{\sum_{j=1}^{N}\pi_j^{(t)}\, p_{Y|X}(y_{i,m}|x_j^{(t-1)})} - \pi_i^{(t)}\frac{p_{Y|X}(y_{i,m}|x_i^{(t-1)})}{\sum_{j=1}^{N}\pi_j^{(t)}\, p_{Y|X}(y_{i,m}|x_j^{(t-1)})}\right],$

where

$h(y_{i,m},x_i) = \frac{\partial}{\partial x_i}\log p_{Y|X}(y_{i,m}|x_i).$

Using (6), it can be obtained:

$h(y_{i,m},x_i) = \frac{\partial}{\partial x_i}\log\left(\sum_{j=1}^{K} w_j\, g(y_{i,m}|x_i;\theta_j)\right) = \frac{\sum_{j=1}^{K} w_j\,\frac{\partial}{\partial x_i} g(y_{i,m}|x_i;\theta_j)}{\sum_{j=1}^{K} w_j\, g(y_{i,m}|x_i;\theta_j)}$

Thus, when g(y|x;θ_(j)) and its derivative are known in closed form, this expression can be computed.
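
A sketch of one such gradient-ascent update for a single position x_i, with `density(y, x) = p_(Y|X)(y|x)` and `h(y, x)` as defined above supplied by the caller (hypothetical names, under the same assumptions as the earlier sketches), might look like:

```python
import numpy as np

def position_step(i, x, pi, y_i, density, h, lam):
    """One gradient-ascent update of x_i (step S2); y_i holds the M samples
    y_{i,m} drawn from p(y|x_i^{(t-1)})."""
    p_i = density(y_i, x[i])                                    # p(y_{i,m}|x_i)
    den = sum(pi[j] * density(y_i, x[j]) for j in range(len(x)))
    bracket = 1.0 + np.log(p_i / den) - pi[i] * p_i / den
    return x[i] + lam * np.mean(h(y_i, x[i]) * bracket)
```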

Finally, the x^((t)) values are obtained and the iteration can continue until a stopping condition is met. The stopping condition is for example an execution time, or the condition that I(x^((t)),π^((t)))−I(x^((t−1)),π^((t−1))) is lower than a given, typically small, threshold.

An example of algorithm is detailed hereafter, with reference to FIG. 2.

Step S0: Initialization Step

-   Step S01: Get the input parameters:
    -   P, the power limit of the constellation,
    -   the initial constellation of N symbols,
    -   the stopping criterion threshold ϵ.
-   Step S02: Get the channel conditional probability distribution in the form p_(Y|X)(y|x)=Σ_(j=1) ^(K) w_(j)g(y|x;θ_(j)), where K and the w_(j) are scalar parameters and θ_(j) is a parameter set, and where the expression of

$\frac{\partial}{\partial x_{i}}{g\left( {\left. y \middle| x \right.;\theta_{j}} \right)}$

is known.

-   Step S03: Set t=0; set all π_(i) ⁽⁰⁾=1/N; set all x_(i) ⁽⁰⁾ from an initial constellation C0; set I⁽⁻¹⁾=0; set t=1.

Step S1: Iterative Step t

Step S10: Samples Generation

-   S101: For all i in [1,N], generate M samples y_(i,m) from the distribution p_(Y|X)(y|x=x_(i) ^((t−1))).
-   S102: For all i in [1,N] and all j in [1,N], compute p_(Y|X)(y_(i,m)|x_(j) ^((t−1))).

Step S11: Compute the Stopping Condition

-   S111: Compute

$I^{(t-1)} = \frac{1}{M}\sum_{i=1}^{N}\sum_{m=1}^{M}\pi_i^{(t-1)}\log\frac{p_{Y|X}(y_{i,m}|x_i^{(t-1)})}{\sum_{j=1}^{N}\pi_j^{(t-1)}\, p_{Y|X}(y_{i,m}|x_j^{(t-1)})}$

-   S112: If I^((t−1))−I^((t−2))<ϵ, stop the iterative algorithm (S113). Otherwise, go to S121.

Step S12: Update the Probabilities π_(i) ^((t))

-   S121: For all i in [1,N] and all m in [1,M], compute

$q(x_i^{(t-1)}|y_{i,m}) = \frac{\pi_i^{(t-1)}\, p_{Y|X}(y_{i,m}|x_i^{(t-1)})}{\sum_{j=1}^{N}\pi_j^{(t-1)}\, p_{Y|X}(y_{i,m}|x_j^{(t-1)})}$

-   S122: Compute β by solving:

$\sum_{i=1}^{N}\exp\left(\beta|x_i^{(t-1)}|^2 + \frac{1}{M}\sum_{m=1}^{M}\log q(x_i^{(t-1)}|y_{i,m})\right)\left[P - |x_i^{(t-1)}|^2\right] = 0$

-   for example by using a Newton-Raphson descent, and/or
-   by using a line-search strategy (taking several β values, computing the above expression and selecting the one closest to 0);
-   S123: For all i in [1,N], compute

$\pi_i^{(t)} = \frac{\exp\left(\beta|x_i^{(t-1)}|^2 + \frac{1}{M}\sum_{m=1}^{M}\log q(x_i^{(t-1)}|y_{i,m})\right)}{\sum_{j=1}^{N}\exp\left(\beta|x_j^{(t-1)}|^2 + \frac{1}{M}\sum_{m=1}^{M}\log q(x_j^{(t-1)}|y_{j,m})\right)},$

Step S2: Update the Symbol Positions x_(i) ^((t)) with the New π_(i) ^((t)) and the Previous x_(i) ^((t−1))

-   S21: For all i in [1,N], all m in [1,M] and all j in [1,K], compute

$\frac{\partial}{\partial x_i} g(y_{i,m}|x_i;\theta_j),$

which is obtained from the known expression of

$\frac{\partial}{\partial x_i} g(y|x;\theta_j)$

by substituting y by y_(i,m) and x by x_(i).

-   S22: For all i in [1,N], compute

$x_i^{(t)} = x_i^{(t-1)} + \lambda_t\,\frac{1}{M}\sum_{m=1}^{M} h\left(y_{i,m},x_i^{(t-1)}\right)\left[1 + \log\frac{p_{Y|X}(y_{i,m}|x_i^{(t-1)})}{\sum_{j=1}^{N}\pi_j^{(t)}\, p_{Y|X}(y_{i,m}|x_j^{(t-1)})} - \pi_i^{(t)}\frac{p_{Y|X}(y_{i,m}|x_i^{(t-1)})}{\sum_{j=1}^{N}\pi_j^{(t)}\, p_{Y|X}(y_{i,m}|x_j^{(t-1)})}\right]$

where h(y_(i,m),x_(i) ^((t−1))) is the value of the function

$h(y_{i,m},x_i) = \frac{\sum_{j=1}^{K} w_j\,\frac{\partial}{\partial x_i} g(y_{i,m}|x_i;\theta_j)}{\sum_{j=1}^{K} w_j\, g(y_{i,m}|x_i;\theta_j)} \quad \text{for } x_i = x_i^{(t-1)}$

The next step S3 is an incrementing of t to loop, for a next iteration, back to step S101.
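
Putting steps S0 to S3 together, a compact end-to-end sketch for an assumed scalar two-state Gaussian mixture channel (reusing the `update_weights` and `position_step` sketches above; all constants are arbitrary and the standard Gaussian normalisation is used here) could be:

```python
import numpy as np

rng = np.random.default_rng(0)
W, A, S2 = np.array([0.6, 0.4]), np.array([1.0, 0.5]), 0.05   # assumed channel

def density(y, x):                 # p(y|x) = sum_j w_j N(y; a_j x, S2)
    return sum(w * np.exp(-(y - a * x) ** 2 / (2 * S2)) / np.sqrt(2 * np.pi * S2)
               for w, a in zip(W, A))

def h(y, x):                       # d/dx log p(y|x), via eq. (6) and the component derivatives
    num = sum(w * (a * (y - a * x) / S2)
              * np.exp(-(y - a * x) ** 2 / (2 * S2)) / np.sqrt(2 * np.pi * S2)
              for w, a in zip(W, A))
    return num / density(y, x)

def sample(x, M):                  # draw y ~ p(y|x)
    a = rng.choice(A, p=W, size=M)
    return a * x + rng.normal(0.0, np.sqrt(S2), size=M)

# S0: initialization
N, M, P, eps, lam = 8, 2000, 1.0, 1e-4, 0.05
x = np.linspace(-1.0, 1.0, N)      # initial constellation C0 (PAM-like)
pi = np.full(N, 1.0 / N)
I_prev = 0.0

for t in range(1, 200):            # S1/S2/S3 loop
    ys = [sample(x[i], M) for i in range(N)]                         # S101
    r = np.empty(N)
    for i in range(N):
        den = sum(pi[j] * density(ys[i], x[j]) for j in range(N))    # S102
        r[i] = np.mean(np.log(density(ys[i], x[i]) / den))
    I = float(pi @ r)                                                # S111, eq. (5)
    if I - I_prev < eps:                                             # S112
        break
    I_prev = I
    pi, _ = update_weights(x, np.log(pi) + r, P)                     # S12, eqs. (7)-(8)
    x = np.array([position_step(i, x, pi, ys[i], density, h, lam)    # S2
                  for i in range(N)])
```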

An artificial intelligence can thus be programmed with such an algorithm to optimize the capacity of a given communication channel (or of several communication channels) in a telecommunication network.

1. A method for optimizing a capacity of a communication channel in a communication system comprising at least a transmitter, a receiver, and said communication channel between the transmitter and the receiver, the transmitter using a finite set of symbols Ω={ω₁, . . . , ω_(N)} having respective positions on a constellation, to transmit a message including at least one symbol on said communication channel, the communication channel being characterized by a conditional probability distribution p_(Y|X)(y|x), where y is the symbol received at the receiver while x is the symbol transmitted by the transmitter, wherein said conditional probability distribution p_(Y|X)(y|x) is obtained, for each possible transmitted symbol x, by a mixture model using probability distributions represented by exponential functions, and an optimized input distribution p_(x)(x) is computed, based on parameters of said mixture model, to define optimized symbol positions and probabilities to be used at the transmitter for optimizing the capacity of the channel.
 2. The method of claim 1, wherein said optimized symbol positions and probabilities are obtained at the transmitter and at the receiver.
 3. The method according to claim 1, wherein the transmitter transmits messages conveyed by a signal belonging to a finite set of signals corresponding respectively to said symbols ω₁, . . . , ω_(N), each signal being associated with a transmission probability according to an optimized input signal probability distribution corresponding to said optimized input distribution p_(x)(x), and wherein the transmitter takes messages to be transmitted and said optimized input signal probability distribution as inputs, and outputs a transmitted signal on the communication channel.
 4. The method according to claim 3, wherein the communication channel takes the transmitted signal as an input, and outputs a received signal intended to be processed at the receiver, said conditional probability distribution p_(Y|X)(y|x) being related thus to a probability of outputting a given signal y when the input x is fixed.
 5. The method according to claim 4, wherein an estimation of said conditional probability distribution p_(Y|X)(y|x) is taken as input, to output the optimized input signal probability distribution p_(x)(x) to be obtained at least at the transmitter, the conditional probability distribution estimation being used for computing the optimized input signal probability distribution, the conditional probability distribution estimation being approximated by said mixture model.
 6. The method according to claim 4, wherein the receiver takes the received signal, the optimized input signal probability distribution p_(x)(x) and an estimation of the channel conditional probability distribution p_(Y|X)(y|x) as inputs and performs an estimation of a message conveyed in said received signal.
 7. The method according to claim 1, wherein said mixture model follows a conditional probability distribution p_(Y|X)(y|x) which is decomposable into a basis of probability distribution exponential functions g(y|x;θ), where θ is a parameter set, such that: $p_{Y|X}(y|x) = \sum_{j=1}^{K} w_j\, g(y|x;\theta_j)$   (E) where K is a predetermined parameter, the set {θ_(j)} gathers the distribution parameters (typically mean vector coordinates and covariance matrix parameters) and the set {w_(j)} gathers the mixture weights.
 8. The method of claim 7, wherein the probability distribution exponential functions g(y|x;θ) are given by g(y|x;θ)=h(y,θ)exp(x^(T)y−α(x,θ)), where h(y,θ) is a function of y and θ, and α(x,θ) is the moment generating function, x and y being vectors, such that the derivative of g(y|x;θ) is given by: $\frac{\partial}{\partial x} g(y|x;\theta) = h(y,\theta)\left(y - \frac{\partial}{\partial x}\alpha(x,\theta)\right)\exp\left(x^{T}y - \alpha(x,\theta)\right)$
 9. The method according to claim 7, wherein said distribution p_(Y|X)(y|x) is approximated by a finite set of continuous functions minimizing a metric defined by the Kullback-Leibler divergence, by determining the parameter sets {θ_(j)},{w_(j)} which minimize the Kullback-Leibler divergence between an analytical observation of p_(Y|X)(y|x) and its expression given by: $p_{Y|X}(y|x) = \sum_{j=1}^{K} w_j\, g(y|x;\theta_j).$
 10. The method according to claim 1, wherein the input distribution p_(x)(x) is represented as a list of N constellation positions as {(x₁,π₁), . . . , (x_(N),π_(N))}, where x_(i) and π_(i) denote respectively constellation positions and probability weights, and wherein said input distribution p_(x)(x) is estimated by solving an optimization problem at the transmitter given by: $(x^{*},\pi^{*}) = \underset{x,\pi}{\operatorname{argmax}}\; I(x,\pi)$, subject to $\sum_{i=1}^{N}\pi_i = 1$, $\sum_{i=1}^{N}|x_i|^2\pi_i \leq P$, and $0 < \pi_i < 1$ for i=1, . . . , N, where: I(x,π) is a mutual information as a function of the position vector x=[x₁, . . . , x_(N)]^(T) and the weight vector π=[π₁, . . . , π_(N)]^(T), optimal values are tagged with the superscript *, and P denotes a total transmit power.
 11. The method of claim 10, wherein said mixture model follows a conditional probability distribution p_(Y|X)(y|x) which is decomposable into a basis of probability distribution exponential functions g(y|x;θ), where θ is a parameter set, such that: $p_{Y|X}(y|x) = \sum_{j=1}^{K} w_j\, g(y|x;\theta_j)$   (E) where K is a predetermined parameter, the set {θ_(j)} gathers the distribution parameters (typically mean vector coordinates and covariance matrix parameters) and the set {w_(j)} gathers the mixture weights; and wherein the mutual information is expressed as: $I(x,\pi) = \frac{1}{M}\sum_{i=1}^{N}\sum_{m=1}^{M}\pi_i \log\frac{p_{Y|X}(y_{i,m}|x_i)}{\sum_{j=1}^{N}\pi_j\, p_{Y|X}(y_{i,m}|x_j)}, \text{ where } p_{Y|X}(y|x) = \sum_{j=1}^{K} w_j\, g(y|x;\theta_j),$ and the arguments y_(i,m) are samples from the distribution p_(Y|X)(y|x_(i)).
 12. The method of claim 11, wherein an alternating optimization is performed iteratively to calculate both p_(x)(x) and p_(Y|X)(y|x)=Σ_(j=1) ^(K) w_(j)g(y|x;θ_(j)), so as to derive from said calculations optimized weights π^((t)) and positions x^((t)), updated from a preceding iteration t−1 to a current iteration t as follows: at first, the weights π^((t)) are optimized for a fixed set of symbol positions x^((t−1)) and previous weight values π^((t−1)); then, the symbol positions x^((t)) are optimized for the thusly determined π^((t)) and the previous values x^((t−1)), and these two steps are repeated iteratively until a stopping condition occurs on the mutual information I(x,π).
 13. A computer program comprising instructions causing a processing circuit to implement the method as claimed in claim 1, when such instructions are executed by the processing circuit.
 14. A system comprising at least a transmitter, a receiver, and a communication channel between the transmitter and the receiver, wherein the transmitter at least is configured to implement the method according to claim 1.
 15. A communication device comprising a processing circuit configured to perform the optimization method according to claim 1.